πŸ‡ΉπŸ‡· Turkish Aspect-Based Sentiment Analysis (ABSA) – BiLSTM + Word2Vec

This model performs aspect-based sentiment analysis (ABSA) on Turkish sentences. Given a sentence and a specific aspect, it predicts the sentiment polarity (Negative, Neutral, Positive) associated with that aspect.

🧠 Model Details

  • Model Type: BiLSTM (Bidirectional Long Short-Term Memory) + Word2Vec
  • Developer: Sengil
  • Library: Keras
  • Input Format: "Sentence [ASP] Aspect"
  • Labels: 0 = Negative, 1 = Neutral, 2 = Positive
  • Training Dataset: Sengil/Turkish-ABSA-Wsynthetic
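
For illustration, the model input is simply the sentence and the aspect term joined by the literal `[ASP]` marker (the sentence/aspect pair below is made up for this example):

```python
# Illustrative only: a made-up restaurant-domain sentence/aspect pair
sentence = "Yemekler çok lezzetliydi ama servis yavaştı."  # "The food was delicious but the service was slow."
aspect = "servis"

model_input = f"{sentence} [ASP] {aspect}"
# -> "Yemekler çok lezzetliydi ama servis yavaştı. [ASP] servis"
```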

πŸ“Š Evaluation Results

The model achieved the following performance on the test set:

| Class    | Precision | Recall | F1-Score | Support |
|----------|-----------|--------|----------|---------|
| Negative | 0.89      | 0.91   | 0.90     | 896     |
| Neutral  | 0.70      | 0.64   | 0.67     | 140     |
| Positive | 0.92      | 0.92   | 0.92     | 1178    |
| Accuracy |           |        | 0.90     | 2214    |
  • Overall Accuracy: 90%
  • Macro-Averaged F1-Score: 83%
  • Weighted-Averaged F1-Score: 90%
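
The macro average is the unweighted mean of the per-class F1 scores, while the weighted average weights each class by its support; both can be recomputed from the table above:

```python
import numpy as np

f1 = np.array([0.90, 0.67, 0.92])        # Negative, Neutral, Positive
support = np.array([896, 140, 1178])

print(f"macro F1:    {f1.mean():.2f}")                        # 0.83
print(f"weighted F1: {np.average(f1, weights=support):.2f}")  # 0.90
```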

πŸš€ Usage Example

Download the model and tokenizer from the Hub:

```python
from huggingface_hub import hf_hub_download
import pickle
from tensorflow.keras.models import load_model

# Fetch the trained model and the fitted Keras tokenizer from the Hub
model_path = hf_hub_download(repo_id="Sengil/Turkish-ABSA-BiLSTM-Word2Vec", filename="absa_bilstm_model.keras")
tokenizer_path = hf_hub_download(repo_id="Sengil/Turkish-ABSA-BiLSTM-Word2Vec", filename="tokenizer.pkl")

# Load the model
model = load_model(model_path)

# Load the tokenizer
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)
```
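
Optionally, sanity-check the loaded artifacts; `word_index` is the standard Keras `Tokenizer` vocabulary mapping:

```python
# Inspect the loaded architecture and the tokenizer vocabulary size
model.summary()
print("vocabulary size:", len(tokenizer.word_index))
```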

Preprocess the input the same way as at training time:

```python
import re

def preprocess_turkish(text):
    text = text.lower()
    # Replace URLs and user mentions with placeholder tokens
    text = re.sub(r"http\S+|www\S+|https\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    # Keep only alphanumeric characters (including Turkish letters) and whitespace
    text = re.sub(r"[^a-zA-Z0-9çğıöşüÇĞİÖŞÜ\s]", " ", text)
    # Squeeze characters repeated 3+ times down to two (e.g. "çoook" -> "çook")
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Collapse whitespace
    text = re.sub(r"\s+", " ", text).strip()
    return text
```
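
A quick check of the cleaning behavior; note that the angle brackets of the `<url>`/`<user>` placeholders are themselves stripped by the character filter, so those tokens survive as plain `url`/`user`:

```python
print(preprocess_turkish("Harikaaaa!!! @kullanici https://ornek.com"))
# harikaa user url
```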

Predict the sentiment for a sentence–aspect pair:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def predict_sentiment(sentence, aspect, max_len=84):
    # Build the "Sentence [ASP] Aspect" input; preprocessing lowercases it
    # and strips the brackets, so the marker reaches the tokenizer as "asp"
    input_text = sentence + " [ASP] " + aspect
    cleaned = preprocess_turkish(input_text)

    # Tokenize and pad to the fixed length the model was trained with
    tokenized = tokenizer.texts_to_sequences([cleaned])
    padded = pad_sequences(tokenized, maxlen=max_len, padding='post')

    pred = model.predict(padded)
    label = np.argmax(pred)
    labels = {0: "Negative", 1: "Neutral", 2: "Positive"}
    return labels[label]
```

Run it:

```python
sentence = "Manzara şahane evet ama servis rezalet."  # "The view is fantastic, yes, but the service is terrible."
aspect = "manzara"

prediction = predict_sentiment(sentence, aspect)
print("prediction:", prediction)
```

πŸ‹οΈβ€β™€οΈ Training Details

  • Embedding: Word2Vec (dimension: 100)

  • Model Architecture:

    • Embedding layer (initialized with pre-trained Word2Vec weights)
    • 2 x BiLSTM layers (each with 100 units, dropout: 0.3)
    • Conv1D layer (100 filters, kernel size: 5)
    • Global Max Pooling
    • Dense layer (16 units, ReLU activation)
    • Output layer (3 units, softmax activation)
  • Training Parameters:

    • Loss Function: sparse_categorical_crossentropy
    • Optimizer: Adam
    • Epochs: 35 (with early stopping)
    • Batch Size: 128
    • Learning Rate: 1e-3 (adjusted dynamically with ReduceLROnPlateau)
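
For reference, here is a minimal Keras sketch of the architecture and callbacks described above. `vocab_size`, `embedding_matrix`, and the callback patience values are assumptions for illustration; the released `absa_bilstm_model.keras` file is the authoritative definition.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM,
                                     Conv1D, GlobalMaxPooling1D, Dense)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

def build_model(vocab_size, embedding_matrix, embedding_dim=100):
    # vocab_size and embedding_matrix would come from the fitted tokenizer
    # and the trained Word2Vec vectors (assumptions, not shipped code)
    model = Sequential([
        # Embedding layer initialized with pre-trained Word2Vec weights
        Embedding(vocab_size, embedding_dim, weights=[embedding_matrix]),
        # 2 x BiLSTM layers, 100 units each, dropout 0.3
        Bidirectional(LSTM(100, return_sequences=True, dropout=0.3)),
        Bidirectional(LSTM(100, return_sequences=True, dropout=0.3)),
        # Conv1D (100 filters, kernel size 5) followed by global max pooling
        Conv1D(100, kernel_size=5, activation="relu"),
        GlobalMaxPooling1D(),
        Dense(16, activation="relu"),
        # 3-way softmax over Negative / Neutral / Positive
        Dense(3, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=Adam(learning_rate=1e-3),
                  metrics=["accuracy"])
    return model

# Early stopping and LR scheduling as described above (patience values assumed)
callbacks = [
    EarlyStopping(patience=5, restore_best_weights=True),
    ReduceLROnPlateau(factor=0.5, patience=2),
]
```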

πŸ“š Training Data

The model was trained on the Sengil/Turkish-ABSA-Wsynthetic dataset, which comprises semi-synthetic Turkish sentences annotated for aspect-based sentiment analysis, particularly in the restaurant domain.

⚠️ Limitations

  • Performance on the Neutral class (F1 = 0.67) is noticeably lower than on the Negative and Positive classes, possibly reflecting class imbalance in the training data (Neutral also has the smallest test support: 140 vs. 896 and 1178).
  • The model may struggle with rare or ambiguous aspects that are not well represented in the training set.
  • Complex sentence structures and ironic expressions may reduce accuracy.

πŸ“„ Citation

```bibtex
@misc{turkish_absa_bilstm_word2vec,
  title  = {Turkish Aspect-Based Sentiment Analysis using BiLSTM + Word2Vec},
  author = {Sengil},
  year   = {2025},
  url    = {https://huggingface.co/Sengil/Turkish-ABSA-BiLSTM-Word2Vec}
}
```

πŸ“¬ Contact

For questions or feedback, please reach out via the Sengil Hugging Face profile.
